Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation
Abstract
This paper extends previous work on the enlargement of monolingual dictionaries of rule-based machine translation systems by non-expert users to tackle the complete task of adding both source-language and target-language words to the monolingual dictionaries and the bilingual dictionary. In the original method, users validate whether some suffix variations of the word to be inserted are correct in order to find the most appropriate inflection paradigm. This method is now improved by taking advantage of the strong correlation detected between paradigms in both languages to reduce the search space of the target-language paradigm once the source-language paradigm is known. Results show that, when the source-language word has already been inserted, the system predicts the right target-language paradigm more accurately, and the number of queries posed to users is significantly reduced. Experiments also show that, when the source language and the target language are not closely related, only the source-language part-of-speech category, and not the rest of the information provided by the source-language paradigm, helps to correctly classify the target-language word.
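The querying idea summarised in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the toy paradigms, the `TARGET_GIVEN_SOURCE` table, the greedy question selection, and all function names are assumptions made up for illustration. Each candidate analysis is a (paradigm, stem) pair that can generate the word typed by the user; the user answers yes/no questions about inflected forms until one candidate is left, and knowing the source-language paradigm shrinks the candidate set before any question is asked.

```python
# Hypothetical toy paradigms: paradigm name -> suffixes it attaches to a stem.
PARADIGMS = {
    "noun_s": ["", "s"],
    "noun_es": ["", "es"],
    "verb_es": ["", "es", "ed", "ing"],
}

# Hypothetical correlation table: target-language paradigms considered
# plausible once the source-language paradigm is known.
TARGET_GIVEN_SOURCE = {"sl_noun": {"noun_s", "noun_es"}}


def candidate_analyses(surface, paradigms):
    """Return (paradigm, stem, forms) triples able to generate `surface`."""
    found = []
    for name, suffixes in paradigms.items():
        for suffix in suffixes:
            if surface.endswith(suffix):
                stem = surface[: len(surface) - len(suffix)]
                forms = frozenset(stem + s for s in suffixes)
                found.append((name, stem, forms))
    return found


def narrow_by_source(cands, source_paradigm):
    """Keep only candidates whose paradigm is plausible for the source one."""
    allowed = TARGET_GIVEN_SOURCE[source_paradigm]
    return [c for c in cands if c[0] in allowed]


def identify(cands, is_valid_form):
    """Ask yes/no questions until one candidate remains.

    Simplification: a "yes" keeps the candidates that generate the form,
    a "no" keeps those that do not. Returns (remaining, number_of_queries).
    """
    queries, asked = 0, set()
    while len(cands) > 1:
        forms = set().union(*(c[2] for c in cands)) - asked
        # Only forms generated by some, but not all, candidates are useful.
        useful = [f for f in sorted(forms)
                  if 0 < sum(f in c[2] for c in cands) < len(cands)]
        if not useful:
            break
        # Greedy choice: the form that splits the candidates most evenly.
        form = min(useful,
                   key=lambda f: abs(2 * sum(f in c[2] for c in cands) - len(cands)))
        asked.add(form)
        queries += 1
        answer = is_valid_form(form)
        cands = [c for c in cands if (form in c[2]) == answer]
    return cands, queries


# Simulated user who knows the noun "wish" and its plural "wishes".
def user(form):
    return form in {"wish", "wishes"}


all_cands = candidate_analyses("wishes", PARADIGMS)
full, full_queries = identify(all_cands, user)
narrowed, narrowed_queries = identify(narrow_by_source(all_cands, "sl_noun"), user)
print(full[0][:2], full_queries)
print(narrowed[0][:2], narrowed_queries)
```

In this toy setting both runs settle on the same paradigm and stem, and the run that first restricts the candidates by the source-language paradigm needs fewer questions, which mirrors the effect reported in the abstract.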
Similar resources
Multimodal Building of Monolingual Dictionaries for Machine Translation by Non-Expert Users
This paper explores a new approach to help non-expert users with no background in linguistics to add new words to a monolingual dictionary in a rule-based machine translation system. Our method aims at obtaining the correct paradigm which explains not only the particular surface form introduced by the user, but also the rest of inflected forms of the word. An initial set of potential paradigms ...
Comparison of SYSTRAN and Google Translate for English→Portuguese
Two machine translation (MT) systems, a statistical MT (SMT) system and a hybrid system (rule-based and SMT) were tested in order to compare various MT performances. The source language was English (EN) and the target language Portuguese (PT). The SMT tool gave much fewer errors than the hybrid system. Major problem areas of both systems concerned the transfer of verb systems from source to tar...
Towards a Thesaurus of Predicates
We propose a thesaurus of predicates that can help to resolve pre-editing and/or post-editing problems in machine translation environments. It differs from earlier approaches such as conventional dictionaries in that we are aiming to link a wide range of near-synonyms and paraphrases. We are compiling such similar examples through both introspection and the use of translation data, giving us a ...
Generation of Bilingual Dictionaries using Comparable and Quasi Comparable Corpora
The amount of information available on the web is increasing rapidly. The number of internet users is also increasing every day. A significant section of internet users is monolingual. They want to express themselves in their native language and also seek information in the same language. Hence, multilingual content over the internet is also increasing at a rapid pace. There is a need of systems whic...
Papillon Lexical Database Project: Monolingual Dictionaries & Interlingual Links
This paper presents a new research and development project called Papillon. It started as a French-Japanese cooperation between laboratories GETA/CLIPS (Grenoble, France) and NII (Tokyo, Japan). Its goal is to build a multilingual lexical database and to extract from it digital bilingual dictionaries. The database is built with monolingual dictionaries, one for each language of the database, li...
Publication date: 2012